Mining the WHO Drug Safety Database Using Lasso Logistic Regression

Author

  • Ola Caster
Abstract

For reasons such as low incidence, occurrence in groups frequently excluded from clinical trials, and long onset times, some adverse drug reactions (ADRs) of a new medicinal product stay unnoticed until after market launch. The World Health Organization (WHO), in collaboration with the Uppsala Monitoring Centre (UMC), continuously collects spontaneous ADR reports from the entire world and uses data mining approaches to detect which drugs are most likely to cause which previously unanticipated ADRs. This WHO drug safety database, the largest of its kind, comprises about 3.8 million accumulated reports. The currently used data mining methods are based on two-dimensional projections of the data with respect to a given drug-ADR combination. This combination is then given an association score based on the discrepancy between the observed and expected number of reports on it. In this thesis these disproportionality-based methods are represented by the information component (IC) measure of the UMC, a shrunk Bayesian measure. A limitation of the IC is its inability to deal with confounding by co-medication and with masking. Confounding by co-medication means that the association between a drug and a certain ADR might seem stronger than it really is because that drug is used together with another drug, which in turn is truly associated with the ADR. Masking, on the other hand, is a phenomenon whereby a very strong association between an ADR and some drug might weaken the associations between that ADR and other drugs. Here a novel method to mine the WHO drug safety database, lasso logistic regression (LLR), is proposed to address these issues. Instead of studying each combination separately, the LLR model fixes the ADR under study and predicts its presence on a report from the presence of all drugs occurring in the database, thus yielding a logistic regression framework. Further, independent Laplace prior distributions are put on the parameters, resulting in a lasso-type shrinkage where a subset of the parameters is shrunk to exactly zero. The LLR was confirmed to correct for confounding by co-medication and masking in simulated scenarios and specific clinical examples. Further, with a specific degree of shrinkage the LLR had 10% higher recall than the IC, while maintaining precision, with respect to a test database. Although its transparency is limited, the LLR has an important role to play in the future of ADR monitoring.
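
The contrast between a disproportionality score and the LLR can be illustrated with a small sketch. It is not the thesis implementation: the report counts are hypothetical, the score is a crude log2 observed-to-expected ratio without the Bayesian shrinkage of the UMC's IC, and the fitting routine (proximal gradient descent, named `fit_lasso_logistic` here) is one assumed way to obtain the L1-penalized estimate, which coincides with the posterior mode under independent Laplace priors. In the made-up data, drug B raises the ADR rate while drug A is only co-reported with B.

```python
import math

# Hypothetical aggregated reports: (drug_a, drug_b, n_reports, n_with_adr).
# Drug B raises the ADR rate from 5% to 40%; drug A has no effect of its
# own but is frequently co-reported with B (confounding by co-medication).
CELLS = [
    (1, 1, 100, 40),
    (0, 1, 100, 40),
    (1, 0, 100, 5),
    (0, 0, 700, 35),
]
N = sum(n for _, _, n, _ in CELLS)


def crude_log2_ratio(drug_index):
    """log2(observed / expected) report count for one drug, no shrinkage."""
    n_drug = sum(c[2] for c in CELLS if c[drug_index] == 1)
    n_adr = sum(c[3] for c in CELLS)
    observed = sum(c[3] for c in CELLS if c[drug_index] == 1)
    expected = n_drug * n_adr / N
    return math.log2(observed / expected)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty; returns exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(lam, step=1.0, iterations=5000):
    """Minimize mean logistic loss + lam * (|beta_a| + |beta_b|).

    The intercept is left unpenalized. Proximal gradient descent is an
    assumption of this sketch, not a method named in the abstract.
    """
    b0 = ba = bb = 0.0
    for _ in range(iterations):
        g0 = ga = gb = 0.0
        for a, b, n, y in CELLS:
            p = 1.0 / (1.0 + math.exp(-(b0 + ba * a + bb * b)))
            r = (n * p - y) / N  # gradient contribution of this cell
            g0 += r
            ga += r * a
            gb += r * b
        b0 -= step * g0
        ba = soft_threshold(ba - step * ga, step * lam)
        bb = soft_threshold(bb - step * gb, step * lam)
    return b0, ba, bb


ic_a = crude_log2_ratio(0)  # drug A looks associated when scored in isolation
ic_b = crude_log2_ratio(1)
intercept, beta_a, beta_b = fit_lasso_logistic(lam=0.005)
print(f"crude score A={ic_a:.2f}, B={ic_b:.2f}")
print(f"LLR coefficients A={beta_a:.3f}, B={beta_b:.3f}")
```

In this toy setting the crude score flags drug A because it looks only at the two-dimensional projection for A and the ADR, whereas the regression adjusts for the presence of drug B and the L1 penalty shrinks A's coefficient to zero. The penalty weight `lam` plays the role of the degree of shrinkage and would need tuning on real data.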


Similar resources

Penalized Lasso Methods in Health Data: application to trauma and influenza data of Kerman

Background: Two main issues that challenge model building are the number of events per variable and multicollinearity among explanatory variables. Our aim is to review statistical methods that tackle these issues, with emphasis on the penalized Lasso regression model. The present study aimed to explain problems of traditional regressions due to small sample size and m...


Detection of independent associations in a large epidemiologic dataset: a comparison of random forests, boosted regression trees, conventional and penalized logistic regression for identifying independent factors associated with H1N1pdm influenza infections

BACKGROUND Big data is steadily growing in epidemiology. We explored the performances of methods dedicated to big data analysis for detecting independent associations between exposures and a health outcome. METHODS We searched for associations between 303 covariates and influenza infection in 498 subjects (14% infected) sampled from a dedicated cohort. Independent associations were detected u...


Extraction of Drug Crime Patterns and Identifying People at Risk Using Data Mining Techniques

Introduction: In recent years, advances in technology and the growth of information technology in organizations have produced a large store of data on drug-related offenses. Analyzing these data and discovering hidden patterns in them can help detect and prevent crimes in this area. This paper aimed to identify people susceptible to drug trafficking in Si...


Text Mining For Information Systems Researchers: An Annotated Topic Modeling Tutorial

Analysts have estimated that more than 80 percent of today’s data is stored in unstructured form (e.g., text, audio, image, video)—much of it expressed in rich and ambiguous natural language. Traditionally, to analyze natural language, one has used qualitative data-analysis approaches, such as manual coding. Yet, the size of text data sets obtained from the Internet makes manual analysis virtua...


Publication date: 2007